Overview

Dataset Statistics

Number of Variables 17
Number of Rows 1716428
Missing Cells 3939947
Missing Cells (%) 13.5%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 512.1 MB
Average Row Size in Memory 312.9 B
Variable Types
  • Numerical: 14
  • Categorical: 3
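The Missing Cells percentage can be cross-checked from figures that appear elsewhere in this report. The sketch below is only a consistency check, not part of the generated output. It assumes an exact row count of 1716428, which is implied by the per-column counts (for example, the 1716403 negatives plus 25 zeros of DAYS_CREDIT), and it sums the per-column missing counts listed under each variable.

```python
# Per-column missing counts as listed in the Variables section of this report.
missing_counts = {
    "DAYS_CREDIT_ENDDATE": 105553,
    "DAYS_ENDDATE_FACT": 633653,
    "AMT_CREDIT_MAX_OVERDUE": 1124488,
    "AMT_CREDIT_SUM": 13,
    "AMT_CREDIT_SUM_DEBT": 257669,
    "AMT_CREDIT_SUM_LIMIT": 591780,
    "AMT_ANNUITY": 1226791,
}

n_rows = 1716428   # assumed exact row count, implied by the per-column counts
n_variables = 17

missing_cells = sum(missing_counts.values())
total_cells = n_rows * n_variables
missing_pct = 100 * missing_cells / total_cells

print(missing_cells)           # 3939947
print(f"{missing_pct:.1f}%")   # 13.5%
```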

Dataset Insights

CREDIT_DAY_OVERDUE and CNT_CREDIT_PROLONG have similar distributions
CREDIT_DAY_OVERDUE and AMT_CREDIT_SUM_OVERDUE have similar distributions
CNT_CREDIT_PROLONG and AMT_CREDIT_SUM_OVERDUE have similar distributions
DAYS_CREDIT_ENDDATE has 105553 (6.15%) missing values
DAYS_ENDDATE_FACT has 633653 (36.92%) missing values
AMT_CREDIT_MAX_OVERDUE has 1124488 (65.51%) missing values
AMT_CREDIT_SUM_DEBT has 257669 (15.01%) missing values
AMT_CREDIT_SUM_LIMIT has 591780 (34.48%) missing values
AMT_ANNUITY has 1226791 (71.47%) missing values
CREDIT_DAY_OVERDUE is skewed
DAYS_CREDIT_ENDDATE is skewed
DAYS_ENDDATE_FACT is skewed
AMT_CREDIT_MAX_OVERDUE is skewed
CNT_CREDIT_PROLONG is skewed
AMT_CREDIT_SUM is skewed
AMT_CREDIT_SUM_DEBT is skewed
AMT_CREDIT_SUM_LIMIT is skewed
AMT_CREDIT_SUM_OVERDUE is skewed
DAYS_CREDIT_UPDATE is skewed
AMT_ANNUITY is skewed
CREDIT_CURRENCY has constant length 10
DAYS_CREDIT has 1716403 (100.0%) negatives
DAYS_CREDIT_ENDDATE has 1007389 (58.69%) negatives
DAYS_ENDDATE_FACT has 1082711 (63.08%) negatives
DAYS_CREDIT_UPDATE has 1715806 (99.96%) negatives
CREDIT_DAY_OVERDUE has 1712211 (99.75%) zeros
AMT_CREDIT_MAX_OVERDUE has 470650 (27.42%) zeros
CNT_CREDIT_PROLONG has 1707314 (99.47%) zeros
AMT_CREDIT_SUM_DEBT has 1016434 (59.22%) zeros
AMT_CREDIT_SUM_LIMIT has 1050142 (61.18%) zeros
AMT_CREDIT_SUM_OVERDUE has 1712270 (99.76%) zeros
AMT_ANNUITY has 256915 (14.97%) zeros
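The zero percentages in the insights are taken relative to all rows, including rows where the value is missing. For columns with many missing values it can help to also look at zeros as a share of the values that are actually present. The sketch below does this from the counts above; the row count of 1716428 is an assumption implied by the per-column counts, and zero_shares is a helper name introduced here for illustration.

```python
n_rows = 1716428  # assumed exact row count, implied by the per-column counts

# column: (missing count, zero count), copied from the insights
counts = {
    "AMT_CREDIT_MAX_OVERDUE": (1124488, 470650),
    "AMT_CREDIT_SUM_DEBT": (257669, 1016434),
    "AMT_CREDIT_SUM_LIMIT": (591780, 1050142),
    "AMT_ANNUITY": (1226791, 256915),
}

def zero_shares(missing, zeros, n_rows):
    """Return (zeros as % of all rows, zeros as % of non-missing rows)."""
    present = n_rows - missing
    return 100 * zeros / n_rows, 100 * zeros / present

shares = {name: zero_shares(m, z, n_rows) for name, (m, z) in counts.items()}
for name, (of_all, of_present) in shares.items():
    print(f"{name}: {of_all:.2f}% of all rows, {of_present:.2f}% of non-missing rows")
```

For AMT_ANNUITY, for example, zeros make up more than half of the non-missing values, which is consistent with its reported median of 0.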

Variables


SK_ID_CURR

numerical

Approximate Distinct Count 305811
Approximate Unique (%) 17.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 27462848 B
Mean 278214.9336
Minimum 100001
Maximum 456255
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • SK_ID_CURR is approximately symmetric (γ1 = 0.0011)

Quantile Statistics

Minimum 100001
5-th Percentile 118239.2
Q1 189771.25
Median 279622.5
Q3 368177
95-th Percentile 439277.2
Maximum 456255
Range 356254
IQR 178405.75

Descriptive Statistics

Mean 278214.9336
Standard Deviation 102938.5581
Variance 1.0596e+10
Sum 4.7754e+11
Skewness 0.001063
Kurtosis -1.2028
Coefficient of Variation 0.37
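A skewness near 0 together with a kurtosis near -1.2 is what a uniform distribution produces (its excess kurtosis is exactly -6/5), which would be consistent with SK_ID_CURR being an identifier spread fairly evenly over its range. The sketch below checks this on a synthetic range of integers, not on the real column. It assumes the Kurtosis figure in this report is excess kurtosis, and sample_moments is a helper name introduced here for illustration.

```python
def sample_moments(values):
    """Return (skewness, excess kurtosis) computed from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis

# Synthetic stand-in for an evenly spread ID column (not the real data).
ids = list(range(100001, 110001))
skewness, excess_kurtosis = sample_moments(ids)
print(round(excess_kurtosis, 3))   # -1.2
```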

SK_ID_BUREAU

numerical

Approximate Distinct Count 1716428
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 27462848 B
Mean 5.9244e+06
Minimum 5000000
Maximum 6843457
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • SK_ID_BUREAU is approximately symmetric (γ1 = -0.0075)

Quantile Statistics

Minimum 5000000
5-th Percentile 5.095e+06
Q1 5.4624e+06
Median 5.9266e+06
Q3 6.3876e+06
95-th Percentile 6.7582e+06
Maximum 6843457
Range 1843457
IQR 925193.05

Descriptive Statistics

Mean 5.9244e+06
Standard Deviation 532265.7286
Variance 2.8331e+11
Sum 1.0169e+13
Skewness -0.007498
Kurtosis -1.199
Coefficient of Variation 0.08984

CREDIT_ACTIVE

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 121853376 B
  • The most frequent value (Closed) occurs over 1.71 times as often as the second most frequent value (Active)

Length

Mean 5.9924
Standard Deviation 0.1233
Median 6
Minimum 4
Maximum 8

Sample

1st row Closed
2nd row Active
3rd row Active
4th row Active
5th row Active

Letter

Count 10285535
Lowercase Letter 8569107
Space Separator 21
Uppercase Letter 1716428
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Closed, Active) take over 50.0%

CREDIT_CURRENCY

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 128732100 B
  • The most frequent value (currency 1) occurs over 1401.16 times as often as the second most frequent value (currency 2)

Length

Mean 10
Standard Deviation 0
Median 10
Minimum 10
Maximum 10

Sample

1st row currency 1
2nd row currency 1
3rd row currency 1
4th row currency 1
5th row currency 1

Letter

Count 13731424
Lowercase Letter 13731424
Space Separator 1716428
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 1716428
  • The top 2 categories (currency 1, currency 2) take over 50.0%
  • CREDIT_CURRENCY has words of constant length

DAYS_CREDIT

numerical

Approximate Distinct Count 2923
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 27462848 B
Mean -1142.1077
Minimum -2922
Maximum 0
Zeros 25
Zeros (%) 0.0%
Negatives 1716403
Negatives (%) 100.0%
  • DAYS_CREDIT is skewed left (γ1 = -0.5823)

Quantile Statistics

Minimum -2922
5-th Percentile -2656
Q1 -1650
Median -976
Q3 -464
95-th Percentile -122
Maximum 0
Range 2922
IQR 1186

Descriptive Statistics

Mean -1142.1077
Standard Deviation 795.1649
Variance 632287.263
Sum -1.9603e+09
Skewness -0.5823
Kurtosis -0.7354
Coefficient of Variation -0.6962
  • DAYS_CREDIT is not normally distributed (p-value 5.1344480804425355e-05)
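The Variance and Coefficient of Variation rows follow directly from the Mean and Standard Deviation. As a small worked check using the DAYS_CREDIT figures above (rounded inputs, so the variance only matches approximately):

```python
mean = -1142.1077
std_dev = 795.1649

# Coefficient of variation = standard deviation / mean.
# It is negative here only because the mean is negative.
coefficient_of_variation = std_dev / mean
variance = std_dev ** 2

print(round(coefficient_of_variation, 4))   # -0.6962
print(round(variance, 1))
```

Because the sign simply mirrors the sign of the mean, the magnitude is the part that describes relative spread.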

CREDIT_DAY_OVERDUE

numerical

Approximate Distinct Count 942
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 27462848 B
Mean 0.8182
Minimum 0
Maximum 2792
Zeros 1712211
Zeros (%) 99.8%
Negatives 0
Negatives (%) 0.0%
  • CREDIT_DAY_OVERDUE is skewed right (γ1 = 55.931)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 0
Maximum 2792
Range 2792
IQR 0

Descriptive Statistics

Mean 0.8182
Standard Deviation 36.5444
Variance 1335.4952
Sum 1.4043e+06
Skewness 55.931
Kurtosis 3374.4743
Coefficient of Variation 44.6662
  • CREDIT_DAY_OVERDUE is not normally distributed (p-value 4.226533655913789e-25)

DAYS_CREDIT_ENDDATE

numerical

Approximate Distinct Count 14096
Approximate Unique (%) 0.9%
Missing 105553
Missing (%) 6.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 25774000 B
Mean 510.5174
Minimum -42060
Maximum 31199
Zeros 883
Zeros (%) 0.1%
Negatives 1007389
Negatives (%) 58.7%
  • DAYS_CREDIT_ENDDATE is skewed right (γ1 = 5.1271)

Quantile Statistics

Minimum -42060
5-th Percentile -2247
Q1 -1130
Median -328
Q3 495
95-th Percentile 2984
Maximum 31199
Range 73259
IQR 1625

Descriptive Statistics

Mean 510.5174
Standard Deviation 4994.2198
Variance 2.4942e+07
Sum 8.2238e+08
Skewness 5.1271
Kurtosis 28.1802
Coefficient of Variation 9.7827
  • DAYS_CREDIT_ENDDATE is not normally distributed (p-value 2.4773622762486178e-18)
  • DAYS_CREDIT_ENDDATE has 79131 outliers
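The report does not state how outliers are counted. One common convention, assumed here purely for illustration, is Tukey's rule: values more than 1.5 × IQR below Q1 or above Q3 are flagged. The sketch computes those fences from the quartiles reported above; tukey_fences is a helper name introduced for this example.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; values outside them are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported above for DAYS_CREDIT_ENDDATE.
lower, upper = tukey_fences(q1=-1130, q3=495)
print(lower, upper)                     # -3567.5 2932.5

# Both reported extremes fall outside the fences.
print(-42060 < lower, 31199 > upper)    # True True
```

A count obtained this way need not match the 79131 reported, since the report's quantiles are approximate and its outlier definition may differ.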

DAYS_ENDDATE_FACT

numerical

Approximate Distinct Count 2917
Approximate Unique (%) 0.3%
Missing 633653
Missing (%) 36.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 17324400 B
Mean -1017.4371
Minimum -42023
Maximum 0
Zeros 64
Zeros (%) 0.0%
Negatives 1082711
Negatives (%) 63.1%
  • DAYS_ENDDATE_FACT is skewed left (γ1 = -0.7748)

Quantile Statistics

Minimum -42023
5-th Percentile -2381
Q1 -1486
Median -894
Q3 -418.04
95-th Percentile -92
Maximum 0
Range 42023
IQR 1067.96

Descriptive Statistics

Mean -1017.4371
Standard Deviation 714.0106
Variance 509811.1745
Sum -1.1017e+09
Skewness -0.7748
Kurtosis 9.4091
Coefficient of Variation -0.7018
  • DAYS_ENDDATE_FACT is not normally distributed (p-value 5.101967812181298e-18)
  • DAYS_ENDDATE_FACT has 1 outlier
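The Memory Size figures for numerical columns shrink as the missing count grows. They are consistent with 16 bytes per non-missing value, which is an inference from the numbers in this report rather than something the report documents. The sketch below checks that relationship; the row count of 1716428 and the helper name estimated_memory are assumptions made for this example.

```python
n_rows = 1716428        # assumed exact row count, implied by the per-column counts
BYTES_PER_VALUE = 16    # inferred from the figures, not documented in the report

# column: (missing count, reported Memory Size)
reported = {
    "SK_ID_CURR": (0, 27462848),
    "DAYS_CREDIT_ENDDATE": (105553, 25774000),
    "DAYS_ENDDATE_FACT": (633653, 17324400),
    "AMT_CREDIT_MAX_OVERDUE": (1124488, 9471040),
    "AMT_CREDIT_SUM": (13, 27462640),
    "AMT_CREDIT_SUM_DEBT": (257669, 23340144),
    "AMT_CREDIT_SUM_LIMIT": (591780, 17994368),
    "AMT_ANNUITY": (1226791, 7834192),
}

def estimated_memory(missing, n_rows=n_rows, bytes_per_value=BYTES_PER_VALUE):
    """Estimate memory as bytes_per_value for every non-missing row."""
    return (n_rows - missing) * bytes_per_value

matches = {name: estimated_memory(m) == size for name, (m, size) in reported.items()}
print(all(matches.values()))   # True
```

If the inference holds, the per-column Memory Size describes the column after missing values are dropped, not the column as stored in the full dataset.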

AMT_CREDIT_MAX_OVERDUE

numerical

Approximate Distinct Count 68251
Approximate Unique (%) 11.5%
Missing 1124488
Missing (%) 65.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 9471040 B
Mean 3825.4177
Minimum 0
Maximum 1.1599e+08
Zeros 470650
Zeros (%) 27.4%
Negatives 0
Negatives (%) 0.0%
  • AMT_CREDIT_MAX_OVERDUE is skewed right (γ1 = 470.9126)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 14550.8962
Maximum 1.1599e+08
Range 1.1599e+08
IQR 0

Descriptive Statistics

Mean 3825.4177
Standard Deviation 206031.6062
Variance 4.2449e+10
Sum 2.2644e+09
Skewness 470.9126
Kurtosis 245694.8495
Coefficient of Variation 53.8586
  • AMT_CREDIT_MAX_OVERDUE is not normally distributed (p-value 4.226514678889686e-25)
  • AMT_CREDIT_MAX_OVERDUE has 121290 outliers

CNT_CREDIT_PROLONG

numerical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 27462848 B
Mean 0.00641
Minimum 0
Maximum 9
Zeros 1707314
Zeros (%) 99.5%
Negatives 0
Negatives (%) 0.0%
  • CNT_CREDIT_PROLONG is skewed right (γ1 = 20.3193)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 0
Maximum 9
Range 9
IQR 0

Descriptive Statistics

Mean 0.00641
Standard Deviation 0.09622
Variance 0.009259
Sum 11003
Skewness 20.3193
Kurtosis 615.437
Coefficient of Variation 15.0106
  • CNT_CREDIT_PROLONG is not normally distributed (p-value 4.231246084362703e-25)

AMT_CREDIT_SUM

numerical

Approximate Distinct Count 236708
Approximate Unique (%) 13.8%
Missing 13
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 27462640 B
Mean 354994.5919
Minimum 0
Maximum 5.85e+08
Zeros 66582
Zeros (%) 3.9%
Negatives 0
Negatives (%) 0.0%
  • AMT_CREDIT_SUM is skewed right (γ1 = 124.586)

Quantile Statistics

Minimum 0
5-th Percentile 12276.675
Q1 52752.735
Median 126000
Q3 315000
95-th Percentile 1.35e+06
Maximum 5.85e+08
Range 5.85e+08
IQR 262247.265

Descriptive Statistics

Mean 354994.5919
Standard Deviation 1.1498e+06
Variance 1.3221e+12
Sum 6.0932e+11
Skewness 124.586
Kurtosis 49315.8235
Coefficient of Variation 3.239
  • AMT_CREDIT_SUM is not normally distributed (p-value 4.22663142924758e-25)
  • AMT_CREDIT_SUM has 188325 outliers

AMT_CREDIT_SUM_DEBT

numerical

Approximate Distinct Count 226537
Approximate Unique (%) 15.5%
Missing 257669
Missing (%) 15.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 23340144 B
Mean 137085.12
Minimum -4.7056e+06
Maximum 1.701e+08
Zeros 1016434
Zeros (%) 59.2%
Negatives 8418
Negatives (%) 0.5%
  • AMT_CREDIT_SUM_DEBT is skewed right (γ1 = 36.4145)

Quantile Statistics

Minimum -4.7056e+06
5-th Percentile 0
Q1 0
Median 0
Q3 42939
95-th Percentile 647532.225
Maximum 1.701e+08
Range 1.7481e+08
IQR 42939

Descriptive Statistics

Mean 137085.12
Standard Deviation 677401.131
Variance 4.5887e+11
Sum 1.9997e+11
Skewness 36.4145
Kurtosis 5673.4148
Coefficient of Variation 4.9415
  • AMT_CREDIT_SUM_DEBT is not normally distributed (p-value 4.239649429044821e-25)
  • AMT_CREDIT_SUM_DEBT has 272879 outliers

AMT_CREDIT_SUM_LIMIT

numerical

Approximate Distinct Count 51726
Approximate Unique (%) 4.6%
Missing 591780
Missing (%) 34.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 17994368 B
Mean 6229.515
Minimum -586406.115
Maximum 4.7056e+06
Zeros 1050142
Zeros (%) 61.2%
Negatives 351
Negatives (%) 0.0%
  • AMT_CREDIT_SUM_LIMIT is skewed right (γ1 = 18.0269)

Quantile Statistics

Minimum -586406.115
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 6622.425
Maximum 4.7056e+06
Range 5.292e+06
IQR 0

Descriptive Statistics

Mean 6229.515
Standard Deviation 45032.0315
Variance 2.0279e+09
Sum 7.006e+09
Skewness 18.0269
Kurtosis 796.0925
Coefficient of Variation 7.2288
  • AMT_CREDIT_SUM_LIMIT is not normally distributed (p-value 4.303887710634066e-25)
  • AMT_CREDIT_SUM_LIMIT has 74506 outliers

AMT_CREDIT_SUM_OVERDUE

numerical

Approximate Distinct Count 1616
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 27462848 B
Mean 37.9128
Minimum 0
Maximum 3.7567e+06
Zeros 1712270
Zeros (%) 99.8%
Negatives 0
Negatives (%) 0.0%
  • AMT_CREDIT_SUM_OVERDUE is skewed right (γ1 = 403.2415)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 0
Maximum 3.7567e+06
Range 3.7567e+06
IQR 0

Descriptive Statistics

Mean 37.9128
Standard Deviation 5937.65
Variance 3.5256e+07
Sum 6.5075e+07
Skewness 403.2415
Kurtosis 211836.2325
Coefficient of Variation 156.6135
  • AMT_CREDIT_SUM_OVERDUE is not normally distributed (p-value 4.22651434546084e-25)

CREDIT_TYPE

categorical

Approximate Distinct Count 15
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 135355782 B
  • The most frequent value (Consumer credit) occurs over 3.11 times as often as the second most frequent value (Credit card)

Length

Mean 13.859
Standard Deviation 2.1036
Median 15
Minimum 8
Maximum 44

Sample

1st row Consumer credit
2nd row Credit card
3rd row Consumer credit
4th row Credit card
5th row Consumer credit

Letter

Count 22093481
Lowercase Letter 20377053
Space Separator 1694305
Uppercase Letter 1716428
Dash Punctuation 56
Decimal Number 0
  • The top 2 categories (Consumer credit, Credit card) take over 50.0%
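The mean length under Length can be recovered from the character-class counts under Letter, assuming those classes (letters, spaces, dashes, digits) together cover every character in the column. The sketch below uses the CREDIT_TYPE figures above and an assumed exact row count of 1716428.

```python
n_rows = 1716428  # assumed exact row count, implied by the per-column counts

# Character-class counts reported above for CREDIT_TYPE.
letters = 22093481   # "Count" row: lowercase plus uppercase letters
spaces = 1694305
dashes = 56
digits = 0

total_chars = letters + spaces + dashes + digits
mean_length = total_chars / n_rows
print(round(mean_length, 3))   # 13.859
```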

DAYS_CREDIT_UPDATE

numerical

Approximate Distinct Count 2982
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 27462848 B
Mean -593.7483
Minimum -41947
Maximum 372
Zeros 605
Zeros (%) 0.0%
Negatives 1715806
Negatives (%) 99.96%
  • DAYS_CREDIT_UPDATE is skewed left (γ1 = -11.335)

Quantile Statistics

Minimum -41947
5-th Percentile -2043
Q1 -900
Median -383
Q3 -32
95-th Percentile -8
Maximum 372
Range 42319
IQR 868

Descriptive Statistics

Mean -593.7483
Standard Deviation 720.7473
Variance 519476.687
Sum -1.0191e+09
Skewness -11.335
Kurtosis 596.3719
Coefficient of Variation -1.2139
  • DAYS_CREDIT_UPDATE is not normally distributed (p-value 1.0074569443119505e-19)
  • DAYS_CREDIT_UPDATE has 66529 outliers

AMT_ANNUITY

numerical

Approximate Distinct Count 40321
Approximate Unique (%) 8.2%
Missing 1226791
Missing (%) 71.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 7834192 B
Mean 15712.7577
Minimum 0
Maximum 1.1845e+08
Zeros 256915
Zeros (%) 15.0%
Negatives 0
Negatives (%) 0.0%
  • AMT_ANNUITY is skewed right (γ1 = 212.5425)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 13729.5
95-th Percentile 47459.1375
Maximum 1.1845e+08
Range 1.1845e+08
IQR 13729.5

Descriptive Statistics

Mean 15712.7577
Standard Deviation 325826.9491
Variance 1.0616e+11
Sum 7.6935e+09
Skewness 212.5425
Kurtosis 58560.096
Coefficient of Variation 20.7365
  • AMT_ANNUITY is not normally distributed (p-value 4.226521148357763e-25)
  • AMT_ANNUITY has 42915 outliers

Interactions

Correlations

Missing Values